
  • Extract data from Wikipedia and Wikidata*

    Hello,
    I need to extract data from Wikipedia and Wikidata.
    Does anyone know how to do that in Stata?
    I use Stata 15 and Stata 14.

    Thanks

  • #2
    You are very precise about the tool you are about to use, yet very vague about the problem you need to solve, akin to "How do I do economic research with a 14-inch Phillips screwdriver?" Instead:
    • if the problem has already been solved in other languages, perhaps you could point to example code and transform your problem to "How do I port this Python code, which reads Wikipedia data, to equivalent Stata code?"
    • if the problem is unique and the data provider does not provide any example code in any language, there is probably some documentation that explains the format of the data; that would be a valuable addition to your question; for example: "The data comes in the form of an XML file with attributes of articles defined as per ...document name/link... I need to extract attributes 'year' and 'author' for every article".
    • Wikipedia data may be similar to Geekipedia, Chickipedia, or Otherpedia data; if so, point to existing solutions for such a site.
    • finally, if there is no description, documentation or example code, perhaps pointing to an example file/URL would be the last resort solution (though an example would help in each of the above cases as well).
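To make the XML example in the second bullet concrete, here is a minimal Python sketch of the kind of code one could point to when asking for a Stata port. The XML layout, the element name `article`, and the helper names `extract_attributes` and `to_csv` are all hypothetical, invented for illustration; this is not the actual format of Wikipedia or Wikidata data.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: the element and attribute names are invented
# for illustration and do not describe any real Wikipedia/Wikidata export.
SAMPLE_XML = """<articles>
  <article year="2001" author="A. Author">First title</article>
  <article year="2015" author="B. Writer">Second title</article>
  <article author="C. Nodate">Third title</article>
</articles>"""


def extract_attributes(xml_text, tag="article", attributes=("year", "author")):
    """Return one dict per <tag> element, using "" for any missing attribute."""
    root = ET.fromstring(xml_text)
    return [
        {name: element.get(name, "") for name in attributes}
        for element in root.iter(tag)
    ]


def to_csv(rows, fieldnames):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_attributes(SAMPLE_XML)
print(to_csv(rows, ["year", "author"]))
```

If the CSV text were saved to a file, Stata 14 or 15 could probably read it with `import delimited`; a question that includes this much detail is far easier to answer.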
    In the absence of these details in your post, here are some generic solutions that you may want to try:
    1. Contact the data producer and ask them to send you data in Stata format.
    2. Hire a consultant who will do this work for you (he/she might very well ask you the same questions as above).
    PS: Geekipedia, Chickipedia, and Otherpedia are fictitious names. Any relationship to real sites or products is purely coincidental.
